FSH: fast spaced seed hashing exploiting adjacent hashes

Internal identifier: 000972 (Main/Exploration); previous: 000971; next: 000973

Authors: Samuele Girotto; Matteo Comin; Cinzia Pizzi

Source :

Algorithms for Molecular Biology : AMB [ 1748-7188 ] ; 2018.

RBID : PMC:5863468

Abstract

Background

Patterns with wildcards in specified positions, known as spaced seeds, are increasingly used instead of k-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications need to compute the hash of each position in the input sequences with respect to a given spaced seed, or to multiple spaced seeds. While k-mer hashes can be computed rapidly by exploiting the large overlap between consecutive k-mers, spaced seed hashes are usually computed from scratch for each position in the input sequence, which results in slower processing.
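The contrast drawn above can be illustrated with a minimal Python sketch. It assumes a 2-bit encoding of nucleotides and places the first symbol in the lowest bits; the function names and the seed notation (a string of `1` for match positions and `0` for wildcards) are illustrative choices, not taken from the FSH software.

```python
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit nucleotide encoding


def kmer_hashes(seq, k):
    """Hash every k-mer, updating the previous value instead of recomputing."""
    if len(seq) < k:
        return []
    h = 0
    for t in range(k):
        h |= ENC[seq[t]] << (2 * t)  # first symbol goes in the lowest bits
    out = [h]
    for c in seq[k:]:
        # Drop the oldest symbol (low bits) and append the new one (high bits).
        h = (h >> 2) | (ENC[c] << (2 * (k - 1)))
        out.append(h)
    return out


def spaced_hashes_naive(seq, seed):
    """Hash every position against a spaced seed, from scratch each time."""
    positions = [p for p, c in enumerate(seed) if c == "1"]
    out = []
    for i in range(len(seq) - len(seed) + 1):
        h = 0
        for rank, p in enumerate(positions):
            h |= ENC[seq[i + p]] << (2 * rank)
        out.append(h)
    return out
```

The k-mer loop does constant work per position, whereas the naive spaced seed loop touches every match position of the seed at every position of the sequence.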

Results

The method proposed in this paper, fast spaced-seed hashing (FSH), exploits the similarity of the hash values of spaced seeds computed at adjacent positions in the input sequence. In our experiments we compute the hash for each position of metagenomics reads from several datasets, with respect to different spaced seeds. We also propose a generalized version of the algorithm for the simultaneous computation of the hashes of multiple spaced seeds. In the experiments, our algorithm computes the hash values of spaced seeds with a speedup over the traditional approach ranging from 1.6× to 5.3×, depending on the structure of the spaced seed.
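The idea of reusing a hash computed at a nearby position can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes a 2-bit nucleotide encoding with the first match position in the lowest bits, reuses the hash computed `j` positions earlier for a fixed `j`, and all function names are hypothetical. A match position `k` is reusable when `k + j` is also a match position and the symbol lands in the correct bit slot after the shift.

```python
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit nucleotide encoding


def seed_positions(seed):
    """Match positions of a seed written as a string of '1' and '0'."""
    return [k for k, c in enumerate(seed) if c == "1"]


def hash_from_scratch(seq, i, positions):
    h = 0
    for rank, k in enumerate(positions):
        h |= ENC[seq[i + k]] << (2 * rank)
    return h


def spaced_hashes_reuse(seq, seed, j=1):
    """Hash every position, recovering part of each hash from position i - j."""
    positions = seed_positions(seed)
    rank = {k: r for r, k in enumerate(positions)}
    m_j = sum(1 for q in positions if q < j)  # match positions before offset j
    # Symbols whose slot in hash(i - j), shifted right, equals their slot in hash(i).
    reusable = [k for k in positions
                if (k + j) in rank and rank[k + j] - m_j == rank[k]]
    mask = 0
    for k in reusable:
        mask |= 3 << (2 * rank[k])
    rest = [k for k in positions if k not in reusable]

    hashes = []
    for i in range(len(seq) - len(seed) + 1):
        if i < j:
            h = hash_from_scratch(seq, i, positions)
        else:
            h = (hashes[i - j] >> (2 * m_j)) & mask  # recovered symbols
            for k in rest:                           # remaining symbols
                h |= ENC[seq[i + k]] << (2 * rank[k])
        hashes.append(h)
    return hashes
```

How many symbols can be recovered depends on how the seed overlaps with a shifted copy of itself, which is consistent with the abstract's remark that the speedup depends on the structure of the spaced seed.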

Conclusions

Spaced seed hashing is a routine task for several bioinformatics applications. FSH performs this task efficiently and raises the question of whether other hashing can be exploited to further improve the speedup. This has the potential for major impact in the field, making spaced seed applications not only accurate, but also faster and more efficient.

Availability

The software FSH is freely available for academic use at: https://bitbucket.org/samu661/fsh/overview.

Url:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5863468

DOI: 10.1186/s13015-018-0125-4
PubMed: 29588651
PubMed Central: 5863468

Affiliations:

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">FSH: fast spaced seed hashing exploiting adjacent hashes</title>
<author><name sortKey="Girotto, Samuele" sort="Girotto, Samuele" uniqKey="Girotto S" first="Samuele" last="Girotto">Samuele Girotto</name>
<affiliation><nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Comin, Matteo" sort="Comin, Matteo" uniqKey="Comin M" first="Matteo" last="Comin">Matteo Comin</name>
<affiliation><nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Pizzi, Cinzia" sort="Pizzi, Cinzia" uniqKey="Pizzi C" first="Cinzia" last="Pizzi">Cinzia Pizzi</name>
<affiliation><nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">29588651</idno>
<idno type="pmc">5863468</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5863468</idno>
<idno type="RBID">PMC:5863468</idno>
<idno type="doi">10.1186/s13015-018-0125-4</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">000248</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000248</idno>
<idno type="wicri:Area/Pmc/Curation">000248</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000248</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000579</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000579</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:29588651</idno>
<idno type="wicri:Area/PubMed/Corpus">000955</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000955</idno>
<idno type="wicri:Area/PubMed/Curation">000955</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000955</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000917</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000917</idno>
<idno type="wicri:Area/Ncbi/Merge">001D78</idno>
<idno type="wicri:Area/Ncbi/Curation">001D78</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001D78</idno>
<idno type="wicri:Area/Main/Merge">000975</idno>
<idno type="wicri:Area/Main/Curation">000972</idno>
<idno type="wicri:Area/Main/Exploration">000972</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">FSH: fast spaced seed hashing exploiting adjacent hashes</title>
<author><name sortKey="Girotto, Samuele" sort="Girotto, Samuele" uniqKey="Girotto S" first="Samuele" last="Girotto">Samuele Girotto</name>
<affiliation><nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Comin, Matteo" sort="Comin, Matteo" uniqKey="Comin M" first="Matteo" last="Comin">Matteo Comin</name>
<affiliation><nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
<author><name sortKey="Pizzi, Cinzia" sort="Pizzi, Cinzia" uniqKey="Pizzi C" first="Cinzia" last="Pizzi">Cinzia Pizzi</name>
<affiliation><nlm:aff id="Aff1"></nlm:aff>
</affiliation>
</author>
</analytic>
<series><title level="j">Algorithms for Molecular Biology : AMB</title>
<idno type="eISSN">1748-7188</idno>
<imprint><date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><sec><title>Background</title>
<p>Patterns with wildcards in specified positions, known as <italic>spaced seeds</italic>
, are increasingly used instead of <italic>k</italic>
-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications need to compute the hash of each position in the input sequences with respect to a given spaced seed, or to multiple spaced seeds. While <italic>k</italic>
-mer hashes can be computed rapidly by exploiting the large overlap between consecutive <italic>k</italic>
-mers, spaced seed hashes are usually computed from scratch for each position in the input sequence, which results in slower processing.</p>
</sec>
<sec><title>Results</title>
<p>The method proposed in this paper, fast spaced-seed hashing (FSH), exploits the similarity of the hash values of spaced seeds computed at adjacent positions in the input sequence. In our experiments we compute the hash for each position of metagenomics reads from several datasets, with respect to different spaced seeds. We also propose a generalized version of the algorithm for the simultaneous computation of the hashes of multiple spaced seeds. In the experiments, our algorithm computes the hash values of spaced seeds with a speedup over the traditional approach ranging from 1.6<inline-formula id="IEq1"><alternatives><tex-math id="M1">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$\times$$\end{document}</tex-math>
<mml:math id="M2"><mml:mo>×</mml:mo>
</mml:math>
<inline-graphic xlink:href="13015_2018_125_Article_IEq1.gif"></inline-graphic>
</alternatives>
</inline-formula>
 to 5.3<inline-formula id="IEq2"><alternatives><tex-math id="M3">\documentclass[12pt]{minimal}
				\usepackage{amsmath}
				\usepackage{wasysym} 
				\usepackage{amsfonts} 
				\usepackage{amssymb} 
				\usepackage{amsbsy}
				\usepackage{mathrsfs}
				\usepackage{upgreek}
				\setlength{\oddsidemargin}{-69pt}
				\begin{document}$$\times$$\end{document}</tex-math>
<mml:math id="M4"><mml:mo>×</mml:mo>
</mml:math>
<inline-graphic xlink:href="13015_2018_125_Article_IEq2.gif"></inline-graphic>
</alternatives>
</inline-formula>
, depending on the structure of the spaced seed.</p>
</sec>
<sec><title>Conclusions</title>
<p>Spaced seed hashing is a routine task for several bioinformatics applications. FSH performs this task efficiently and raises the question of whether other hashing can be exploited to further improve the speedup. This has the potential for major impact in the field, making spaced seed applications not only accurate, but also faster and more efficient.</p>
</sec>
<sec><title>Availability</title>
<p>The software FSH is freely available for academic use at: <ext-link ext-link-type="uri" xlink:href="https://bitbucket.org/samu661/fsh/overview">https://bitbucket.org/samu661/fsh/overview</ext-link>
.</p>
</sec>
</div>
</front>
</TEI>
<affiliations><list></list>
<tree><noCountry><name sortKey="Comin, Matteo" sort="Comin, Matteo" uniqKey="Comin M" first="Matteo" last="Comin">Matteo Comin</name>
<name sortKey="Girotto, Samuele" sort="Girotto, Samuele" uniqKey="Girotto S" first="Samuele" last="Girotto">Samuele Girotto</name>
<name sortKey="Pizzi, Cinzia" sort="Pizzi, Cinzia" uniqKey="Pizzi C" first="Cinzia" last="Pizzi">Cinzia Pizzi</name>
</noCountry>
</tree>
</affiliations>
</record>
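A record of this shape can be read with Python's standard library, as in the minimal sketch below. Two assumptions are made: the record as shown does not declare the `nlm`, `wicri`, `mml` and `xlink` prefixes, so the sketch wraps it in a hypothetical element that binds them to placeholder URIs, and the `FRAGMENT` string is an abridged copy of the record used only for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record above, for illustration only.
FRAGMENT = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en">FSH: fast spaced seed hashing exploiting adjacent hashes</title>
<author><name sortKey="Girotto, Samuele">Samuele Girotto</name>
<affiliation><nlm:aff id="Aff1"></nlm:aff></affiliation></author>
<author><name sortKey="Comin, Matteo">Matteo Comin</name></author>
</titleStmt><publicationStmt>
<idno type="pmid">29588651</idno>
<idno type="doi">10.1186/s13015-018-0125-4</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus">000248</idno>
</publicationStmt></fileDesc></teiHeader></TEI></record>"""

# Placeholder namespace URIs (assumptions): they only make the prefixes parseable.
WRAPPER_OPEN = ('<wrapper xmlns:nlm="urn:example:nlm" '
                'xmlns:wicri="urn:example:wicri" '
                'xmlns:mml="urn:example:mml" '
                'xmlns:xlink="urn:example:xlink">')


def parse_record(xml_text):
    """Extract title, author names and identifiers from a record string."""
    root = ET.fromstring(WRAPPER_OPEN + xml_text + "</wrapper>")
    title = root.find(".//titleStmt/title").text
    authors = [n.text for n in root.findall(".//titleStmt/author/name")]
    # If several idno elements share a type, the last one wins in this sketch.
    ids = {i.get("type"): i.text for i in root.findall(".//idno")}
    return {"title": title, "authors": authors, "ids": ids}
```

Calling `parse_record(FRAGMENT)` returns a dictionary with the title, the list of author names, and the identifiers keyed by their `type` attribute.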

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000972 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000972 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:5863468
   |texte=   FSH: fast spaced seed hashing exploiting adjacent hashes
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:29588651" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021

MERS exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.